{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook 4.1: ChatGLM2-6B\n",
    "\n",
    "## 4.1.1 概述\n",
    "本示例展示了如何使用[BigDL-LLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm) API 在低成本 PC 上（无需独立显卡）运行[ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B) 中文推理。ChatGLM2-6B 是 [THUDM](https://github.com/THUDM) 提出的开源双语（汉英）聊天模型 [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) 的第二代版本。ChatGLM2-6B 也可在以下[链接](https://huggingface.co/THUDM/chatglm2-6b)中的[Huggingface models](https://huggingface.co/models)中找到。\n",
    "\n",
    "在进行推理之前，您可能需要根据[第二章](../ch_2_Environment_Setup/README.md)的说明配置环境。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.2 安装\n",
    "\n",
    "首先，在准备好的环境中安装 BigDL-LLM。有关环境配置的最佳实践，请参阅本教程的 [第二章](../ch_2_Environment_Setup/README.md)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --pre --upgrade bigdl-llm[all]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "all 选项用于安装 BigDL-LLM 所需的其他软件包。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.3 加载模型以及 Tokenizer\n",
    "\n",
    "### 4.1.3.1 加载模型\n",
    "\n",
    "使用 BigDL-LLM API 加载低精度优化（INT4）的 ChatGLM2 模型以降低资源成本，这会将模型中的相关层转为 INT4 格式。\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> BigDL-LLM 支持 `AutoModel`、`AutoModelForCausalLM`、`AutoModelForSpeechSeq2Seq` 和 `AutoModelForSeq2SeqLM`。前缀带有 Auto 的这些类可帮助用户自动读取相关模型，因此，我们只需使用 `AutoModel` 加载即可。\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> 您可以指定参数 `model_path` 为 Huggingface repo id 或本地模型路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.llm.transformers import AutoModel\n",
    "\n",
    "model_path = \"THUDM/chatglm2-6b\"\n",
    "model = AutoModel.from_pretrained(model_path,\n",
    "                                  load_in_4bit=True,\n",
    "                                  trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.3.2 加载 Tokenizer\n",
    "\n",
    "LLM 推理也需要一个 tokenizer。它用于将输入文本编码为张量，然后输入 LLM，并将 LLM 输出的张量解码为文本。您可以使用 [Huggingface transformers](https://huggingface.co/docs/transformers/index) API 直接加载 tokenizer。它可以与 BigDL-LLM 加载的模型无缝配合使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path,\n",
    "                                          trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.4 推理\n",
    "\n",
    "### 4.1.4.1 创建 Prompt 模板\n",
    "\n",
    "在生成之前，您需要创建一个 prompt 模板。这里我们给出了一个用于提问和回答的 prompt 模板示例，参考自 [ChatGLM2-6B prompt template](https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L1007)。您也可以根据自己的模型调整 prompt。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "CHATGLM_V2_PROMPT_TEMPLATE = \"问：{prompt}\\n\\n答：\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.4.2 生成\n",
    "\n",
    "接下来，您可以使用加载的模型与 tokenizer 生成输出。\n",
    "\n",
    "> **注意**\n",
    "> \n",
    ">  `generate` 函数中的 `max_new_tokens` 参数定义了预测的最大 token 数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-------------------- Output --------------------\n",
      "问：AI是什么？\n",
      "\n",
      "答： AI指的是人工智能,是一种能够通过学习和推理来执行任务的计算机程序。它可以模仿人类的思维方式,做出类似人类的决策,并且具有自主学习、自我\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "import torch\n",
    "\n",
    "prompt = \"AI是什么？\"\n",
    "n_predict = 32\n",
    "\n",
    "with torch.inference_mode():\n",
    "    prompt = CHATGLM_V2_PROMPT_TEMPLATE.format(prompt=prompt)\n",
    "    input_ids = tokenizer.encode(prompt, return_tensors=\"pt\")\n",
    "    output = model.generate(input_ids,\n",
    "                            max_new_tokens=n_predict)\n",
    "    output_str = tokenizer.decode(output[0], skip_special_tokens=True)\n",
    "    print('-'*20, 'Output', '-'*20)\n",
    "    print(output_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.4.3 流式对话\n",
    "\n",
    "ChatGLM2-6B 支持流式输出函数 `stream_chat`，它使模型能够逐字提供流式响应。但是，其他模型可能不提供类似的 API，如果您想实现一般的流式输出功能，请参阅 [5.1 章](../ch_5_AppDev_Intermediate/5_1_ChatBot.ipynb)。\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> 为了成功观察到标准输出中的文本流，我们需要设置环境变量 `PYTHONUNBUFFERED=1` 以确保标准输出流直接发送到终端，而无需先进行缓冲。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-------------------- Stream Chat Output --------------------\n",
      "AI指的是人工智能,是一种能够通过学习和理解数据,以及应用适当的算法和数学模型,来执行与人类智能相似的任务的计算机程序。AI可以包括机器学习、自然语言处理、计算机视觉、专家系统、强化学习等不同类型的技术。\n",
      "\n",
      "AI的应用领域广泛,例如自然语言处理可用于语音识别、机器翻译、情感分析等;计算机视觉可用于人脸识别、图像识别、自动驾驶等;机器学习可用于预测、分类、聚类等数据分析任务。\n",
      "\n",
      "AI是一种非常有前途的技术,已经在许多领域产生了积极的影响,并随着技术的不断进步,将继续为我们的生活和工作带来更多的便利和改变。"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "with torch.inference_mode():\n",
    "    question = \"AI 是什么？\"\n",
    "    response_ = \"\"\n",
    "    print('-'*20, 'Stream Chat Output', '-'*20)\n",
    "    for response, history in model.stream_chat(tokenizer, question, history=[]):\n",
    "        print(response.replace(response_, \"\"), end=\"\")\n",
    "        response_ = response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.5 在 LangChain 中使用\n",
    "\n",
    "[LangChain](https://python.langchain.com/docs/get_started/introduction.html)是一个被广泛的用于开发由语言模型驱动的应用程序的框架。本节将介绍如何将 BigDL-LLM 与 LangChain 集成。您可以按照此[说明](https://python.langchain.com/docs/get_started/installation)为 LangChain 配置环境。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果需要的话，请按如下方式安装 LangChain，或者参考 [第八章](../ch_8_AppDev_Advanced/README.md) 获取更多有关 LangChain 集成的信息："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U langchain==0.0.248"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **注意**\n",
    "> \n",
    "> 我们建议使用 `langchain==0.0.248`，这个版本在我们的教程中不会出现问题。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.5.1 创建 Prompt 模板\n",
    "\n",
    "在推理之前，您需要创建一个 prompt 模板。这里我们给出了一个用于提问和回答的 prompt 模板示例，其中包含两个输入变量：`history` 和 `human_input`。您也可以根据自己的模型调整 prompt 模板。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "CHATGLM_V2_LANGCHAIN_PROMPT_TEMPLATE = \"\"\"{history}\\n\\n问：{human_input}\\n\\n答：\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.5.2 准备 Chain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 [LangChain API](https://api.python.langchain.com/en/latest/api_reference.html) `LLMChain` 构造用于推理的 chain。在这里，我们使用 BigDL-LLM API 来构造一个 `LLM` 对象，它将自动加载经过低精度优化的模型。\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> `ConversationBufferWindowMemory` 是 LangChain 中的一种存储类型，用于保存一个装有对话中最近 `k` 次交互的移动窗口。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain import LLMChain, PromptTemplate\n",
    "from bigdl.llm.langchain.llms import TransformersLLM\n",
    "from langchain.memory import ConversationBufferWindowMemory\n",
    "\n",
    "llm_model_path = \"THUDM/chatglm2-6b\" # huggingface llm 模型的路径\n",
    "\n",
    "prompt = PromptTemplate(input_variables=[\"history\", \"human_input\"], template=CHATGLM_V2_LANGCHAIN_PROMPT_TEMPLATE)\n",
    "max_new_tokens = 128\n",
    "\n",
    "llm = TransformersLLM.from_model_id(\n",
    "        model_id=llm_model_path,\n",
    "        model_kwargs={\"trust_remote_code\": True},\n",
    ")\n",
    "\n",
    "# 以下代码与用例完全相同\n",
    "llm_chain = LLMChain(\n",
    "    llm=llm,\n",
    "    prompt=prompt,\n",
    "    verbose=True,\n",
    "    llm_kwargs={\"max_new_tokens\":max_new_tokens},\n",
    "    memory=ConversationBufferWindowMemory(k=2),\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1.5.3 生成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3m\n",
      "\n",
      "问：AI 是什么？\n",
      "\n",
      "答：\u001b[0m\n",
      "AI指的是人工智能,是一种能够通过学习和理解数据,以及应用数学、逻辑、推理等知识,来实现与人类智能相似或超越人类智能的计算机系统。AI可以分为弱人工智能和强人工智能。弱人工智能是指一种只能完成特定任务的AI系统,比如语音识别或图像识别等;而强人工智能则是一种具有与人类智能相同或超越人类智能的AI系统,可以像人类一样思考、学习和理解世界。目前,AI的应用领域已经涵盖了诸如自然语言处理、计算机视觉、机器学习、深度学习、自动驾驶、医疗健康等多个领域。\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n"
     ]
    }
   ],
   "source": [
    "text = \"AI 是什么？\"\n",
    "response_text = llm_chain.run(human_input=text,stop=\"\\n\\n\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cn-eval",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
